In [1]:
import numpy as np
import matplotlib.pyplot as pl
import seaborn as sn
import scipy.stats
%matplotlib inline

This small note shows the importance of priors. Just before entering the asteroid field in "The Empire Strikes Back", C3PO tells Han that the probability of successfully navigating the asteroid field is approximately 1 in 3720.

Let us assume that C3PO is not just quoting a small number to discourage Han from entering the asteroid field. C3PO is fluent in over six million forms of communication, which makes us believe he probably has access to databases where he found these numbers.

Let us imagine that of 7442 pilots who tried to navigate the asteroid field, only 2 succeeded, while 7440 died. A sensible choice for the likelihood function is the $\mathrm{Beta}(\alpha,\beta)$ distribution, where $\alpha$ is the number of successes and $\beta$ is the number of failures.


In [7]:
alpha = 2.0
beta = 7440.0
x = np.linspace(0,0.005, 100)
p = scipy.stats.beta.pdf(x, alpha, beta)
f, ax = pl.subplots()
ax.plot(x,p)
ax.axvline(1.0/3720.0)


Out[7]:
<matplotlib.lines.Line2D at 0x66a8ed0>
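As a quick sanity check (this computation is mine, not part of the original data), the mean of the $\mathrm{Beta}(2, 7440)$ likelihood is $\alpha/(\alpha+\beta) = 2/7442$, which is indeed very close to C3PO's quoted 1 in 3720:

```python
import scipy.stats

# Mean of the Beta(alpha, beta) likelihood is alpha / (alpha + beta)
alpha, beta = 2.0, 7440.0
mean = scipy.stats.beta.mean(alpha, beta)  # = 2/7442 ~ 0.000269
print(mean, 1.0 / 3720.0)  # both about 2.7e-4
```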

Even though the odds are very small, we know that Han is a badass, so we really trust that he will succeed. Otherwise, why watch the movie if one of the main characters dies?

Then, assume we impose a Beta prior encoding our belief that Han has 20000:1 odds of successfully navigating the asteroid field.


In [8]:
alpha = 20000.0
beta = 1.0
x = np.linspace(0.999,1.0, 100)
p = scipy.stats.beta.pdf(x, alpha, beta)
f, ax = pl.subplots()
ax.plot(x,p)


Out[8]:
[<matplotlib.lines.Line2D at 0x6697650>]
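We can verify (a small check of my own, under the same parameterization) that this $\mathrm{Beta}(20000, 1)$ prior really encodes roughly 20000:1 odds in Han's favor, by converting its mean back to odds:

```python
import scipy.stats

# Mean of the Beta(20000, 1) prior: 20000/20001 ~ 0.99995
alpha_prior, beta_prior = 20000.0, 1.0
prior_mean = scipy.stats.beta.mean(alpha_prior, beta_prior)

# Convert probability p to odds p / (1 - p): recovers ~20000
odds = prior_mean / (1.0 - prior_mean)
print(prior_mean, odds)
```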

Following Bayes' theorem, the posterior distribution for the success probability is proportional to the product of the likelihood and the prior:

$$ p(\mathrm{success}\mid\mathrm{Data}) \propto p(\mathrm{Data}\mid\mathrm{success})\, p(\mathrm{success}) $$

For $\mathrm{Beta}$ distributions, this product is easy to compute, since the Beta family is conjugate to itself under this multiplication:

$$ \mathrm{Beta}(\alpha_\mathrm{posterior},\beta_\mathrm{posterior}) \propto \mathrm{Beta}(\alpha_\mathrm{likelihood}+\alpha_\mathrm{prior},\beta_\mathrm{likelihood}+\beta_\mathrm{prior}) $$

In [14]:
alpha = 20000.0 + 2.0
beta = 1.0 + 7440.0
x = np.linspace(0.7,0.8, 200)
p = scipy.stats.beta.pdf(x, alpha, beta)
f, ax = pl.subplots()
ax.plot(x,p)


Out[14]:
[<matplotlib.lines.Line2D at 0x70491d0>]

So our posterior probability for Han's success is a decent $\sim 73$%: the strong prior dominates the grim data. This shows the importance of priors.
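The $\sim 73$% figure can be read off directly from the posterior mean $\alpha/(\alpha+\beta)$, without plotting (a quick check of my own):

```python
import scipy.stats

# Posterior parameters from the conjugate update above
alpha_post = 20000.0 + 2.0
beta_post = 1.0 + 7440.0

# Posterior mean: 20002/27443 ~ 0.729
post_mean = scipy.stats.beta.mean(alpha_post, beta_post)
print(post_mean)
```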

